Genomewide Motif Identification Using a Dictionary Model
Authors
Abstract
This paper surveys and extends models and algorithms for identifying binding sites in non-coding regions of DNA. These sites control the transcription of genes into messenger RNA in preparation for translation into proteins. We summarize the underlying biology, review three different models for binding site identification, and present a unified model that borrows from the previous models and integrates their main features. We then describe maximum likelihood and maximum a posteriori algorithms for fitting the unified model to data. Finally, we conclude with a prospectus of future data analyses and theoretical research.
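The abstract does not spell out the fitting algorithms. As a rough illustration only, here is a minimal maximum-likelihood EM sketch for the simplest form of the dictionary idea: the deterministic-word model of Bussemaker et al. (2000), not the unified model of this paper. Non-coding sequence is assumed to be a concatenation of independently drawn words with probabilities p_w. Forward and backward sums run over every segmentation of the sequence. Each p_w is then re-estimated as the expected count of w divided by the expected total number of words. The function names (`forward`, `backward`, `em_step`) and the toy dictionary are hypothetical. The sketch works in plain probabilities, so long genomic sequences would need scaling or log-space arithmetic.

```python
from collections import defaultdict


def forward(seq, probs, max_len):
    """F[i] = summed probability of all segmentations of seq[:i] into dictionary words."""
    n = len(seq)
    F = [0.0] * (n + 1)
    F[0] = 1.0
    for i in range(1, n + 1):
        for L in range(1, min(max_len, i) + 1):
            p = probs.get(seq[i - L:i])
            if p:  # word is in the dictionary with nonzero probability
                F[i] += F[i - L] * p
    return F


def backward(seq, probs, max_len):
    """B[i] = summed probability of all segmentations of seq[i:]."""
    n = len(seq)
    B = [0.0] * (n + 1)
    B[n] = 1.0
    for i in range(n - 1, -1, -1):
        for L in range(1, min(max_len, n - i) + 1):
            p = probs.get(seq[i:i + L])
            if p:
                B[i] += p * B[i + L]
    return B


def em_step(seq, probs):
    """One EM update (hypothetical helper). Returns (new_probs, likelihood under old probs).

    Assumes every single letter of seq is in the dictionary, so that at
    least one segmentation exists and the likelihood Z is positive.
    """
    max_len = max(len(w) for w in probs)
    F = forward(seq, probs, max_len)
    B = backward(seq, probs, max_len)
    Z = F[len(seq)]  # likelihood: sum over all segmentations
    counts = defaultdict(float)
    for i in range(len(seq)):
        for L in range(1, min(max_len, len(seq) - i) + 1):
            w = seq[i:i + L]
            p = probs.get(w)
            if p:
                # posterior probability that a copy of w occupies seq[i:i+L]
                counts[w] += F[i] * p * B[i + L] / Z
    total = sum(counts.values())  # expected number of words
    return {w: counts[w] / total for w in probs}, Z


# Toy usage: a short sequence built from repeats of a hypothetical word "ACGT".
seq = "ACGTTACGTGACGT"
probs = {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.2, "ACGT": 0.2}
for _ in range(10):
    probs, Z = em_step(seq, probs)
print(probs)
```

Because each EM iteration cannot decrease the likelihood, the weight on the repeated word "ACGT" grows while the single letters it covers lose weight. Keeping single letters in the dictionary guarantees every sequence has at least one segmentation.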
Related articles
“Genomewide Motif Recognition with a Dictionary Model”
Authors: Chiara Sabatti and Kenneth Lange. Bussemaker et al. (2000, PNAS) proposed the simple idea of modeling non-coding DNA sequence as a concatenation of words and gave an algorithm to reconstruct deterministic words from an observed sequence. Starting from the same premises, we consider words that can be spelled in a variety of forms (hence accounting for varying degrees of conservati...
Finding Genes by Hidden Markov Models with a Protein Motif Dictionary
A new method for combining a protein motif dictionary with a gene-finding system is proposed. The system consists of Hidden Markov Models (HMMs) and a dictionary. The HMMs represent the nucleotide bases, the codons, and the amino acids. The 'words' in the dictionary are described by sequences of these HMMs and represent the noncoding regions, the codons, protein motifs, tRNA regions and signals...
A New Dictionary Construction Method in Sparse Representation Techniques for Target Detection in Hyperspectral Imagery
Hyperspectral data in remote sensing, gathered at high spectral resolution (about 10 nm), contain a large number of spectral bands (roughly 200). Since valuable information about the spectral features of target materials can be extracted from these data, they have been used extensively in hyperspectral target detection. One of the problems associated with the detect...
Methods in Comparative Genomics: Genome Correspondence, Gene Identification and Regulatory Motif Discovery
In Kellis et al. (2003), we reported the genome sequences of S. paradoxus, S. mikatae, and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genomewide comparative analysis allowed the identification of functionally important sequences, both coding and noncoding. In this companion paper we describe the mathematical and algorithmic results underpinning the...
Nucleosome Occupancy Information Improves de novo Motif Discovery
A complete understanding of transcriptional regulatory processes in the cell requires identification of transcription factor binding sites on a genomewide scale. Unfortunately, these binding sites are typically short and degenerate, posing a significant statistical challenge: many more matches to known transcription factor binding sites occur in the genome than are actually functional. Chromati...